Language/OS - Multiplatform Resource Library

home *** CD-ROM | disk | FTP | other *** search

/ Language/OS - Multiplatform Resource Library / LANGUAGE OS.iso / cocktail / doc.lha / doc.doc / ell.doc < prev next >

Wrap

Text File | 1992-09-25 | 44KB | 1,356 lines

___________________________________________________________________ Efficient and Comfortable Error Recovery in Recursive Descent Parsers J. Grosch ___________________________________________________________________ ___________________________________________________________________ GESELLSCHAFT FUeR MATHEMATIK UND DATENVERARBEITUNG MBH FORSCHUNGSSTELLE FUeR PROGRAMMSTRUKTUREN AN DER UNIVERSITAeT KARLSRUHE ___________________________________________________________________ Project Compiler Generation ___________________________________________________________ Efficient and Comfortable Error Recovery in Recursive Descent Parsers Josef Grosch Dec. 11, 1989 ___________________________________________________________ Report No. 19 Copyright c 1989 GMD Gesellschaft fuer Mathematik und Datenverarbeitung mbH Forschungsstelle an der Universitaet Karlsruhe Vincenz-Priesznitz-Str. 1 D-7500 Karlsruhe 1 Efficient and Comfortable Error Recovery in Recursive Descent Parsers Josef Grosch GMD Forschungsstelle an der Universitaet Karlsruhe Vincenz-Priesznitz-Str. 1, D-7500 Karlsruhe, Germany Tel: +721-6622-26 E-Mail: grosch@karlsruhe.gmd.de Abstract This paper describes the interesting features of the recursive descent parser generator Ell from the user's point of view. Some of the interesting implementation aspects are discussed. The generated parsers are extremely fast and run at speed of 55,000 tokens per second or 900,000 lines per minute on a MC 68020 processor. The outstanding features of Ell are the L-attribution mechanism, its ability to handle non LL(1) grammars, and its comfortable error handling, which includes error reporting, recovery, and repair. 1. Introduction Recursive descent parsing is an established technique for the analysis of LL(1) languages since many years. It can be implemented easily by a hand- written program or by using one of the existing parser generators such as [DuW81, Gra88a, ReM85]. In real life applications, issues such as good qual- ity error recovery and high run time performance are essential. These require- ments are rarely discussed in the literature and seldom achieved by the exist- ing parser generators. This explains the reasons for implementing yet another LL(1) parser generator and to describe some details of the generated code. This paper presents how comfortable features like error recovery, error repair, and L-attribution are implemented in the high performance parsers gen- erated by the parser generator Ell [Gro88, GrV88]. Ell generates recursive descent parsers from LL(1) grammars given in extended BNF. The grammar rules may be associated with semantic actions consisting of arbitrary statements which are executed whenever they are passed during a left-to-right parse. An L-attribution may be evaluated during parsing. Ell offers possibilities to resolve the LL(1) conflicts of non LL(1) grammars. The generated parsers automatically include error reporting, recovery, and repair. Parsers can be generated in the target languages C and Modula-2. The parsers are extremely fast and run at speed of 55,000 tokens per second or 900,000 lines per minute on a MC 68020 processor. This paper addresses only the interesting features of Ell. The mechanism for L-attribution and the error recovery are described from the user's point of view as well as from the implementation view. We also present how LL(1) conflicts are resolved and discuss further implementation issues such as test- ing for set membership or the pitfalls of CASE (switch) statements. The exam- ples are taken from a parser for Modula-2 and also use Modula-2 as target language. 2 2. L-Attribution According to [Wil79] an attribute grammar which can be evaluated during LL(1)-parsing is called an L-attributed grammar. The notion L-attribution means that all attributes can be evaluated in a single top-down left-to-right tree walk. 2.1. User's View The specification language of Ell distinguishes three kinds of grammar symbols: nonterminals, terminals, and literals. Literals are similar to termi- nals and are denoted by strings. Terminals and nonterminals are denoted by identifiers. Terminals and nonterminals can be associated with arbitrary many attributes of arbitrary types. The computation of the attribute values takes place in the semantic action parts of a rule. The attributes are accessed by an attribute designator which consists of the name of the grammar symbol, a dot character, and the name of the attribute. As several grammar symbols with the same name can occur within a rule, the grammar symbols are denoted unambi- guously by appending numbers to their names. The left-hand side symbol always receives the number zero. For every (outermost) alternative of the right-hand side, the symbols with the same name are counted starting from one. Example: expr : ( [ '+' ] term { expr0.value := term1.value; } | '-' term { expr0.value := - term2.value; } ) ( '+' term { INC (expr0.value, term3.value); } | '-' term { DEC (expr0.value, term4.value); } ) * . term : fact { term0.value := fact1.value; } ( '*' fact { term0.value := term0.value * fact2.value; } | '/' fact { term0.value := term0.value DIV fact3.value; } ) * . fact : const { fact0.value := const1.value; } | '(' expr ')' { fact0.value := expr1.value; } . The above example specifies the evaluation of simple arithmetic expres- sions using one attribute called value. The operator "*" denotes repetition zero, once, or more times. The attributes have to be declared as members of a record type called tParsAttribute. Example: TYPE tParsAttribute = RECORD value: INTEGER; END; 3 2.2. Implementation The implementation of the L-attribution in the generated parsers is very simple. As usual, every nonterminal is analyzed by a procedure. Every pro- cedure has one (reference) parameter referring to the left-hand side attri- butes. As all attributes are declared as members of one (record) type, one parameter suffices to pass an arbitrary number of attributes. For all right- hand side symbols with attributes, local variables are declared. This solu- tion provides very efficient stacking of attributes via the usual procedure call mechanism. Stacking is necessary for the attribute evaluation of recur- sive grammar rules. Example: PROCEDURE expr (...; VAR expr0: tParsAttribute); VAR term1, term2, term3, term4: tParsAttribute; BEGIN ... END expr; 3. Non LL(1) Grammars Sometimes grammars do not obey the LL(1) property. They are said to con- tain LL(1) conflicts. A well-known example is the dangling-else problem of Pascal: in case of nested it-then-else statements it may not be clear to which IF an ELSE belongs. It is very easy to solve this conflicts in hand-written solutions. Ell handles LL(1) conflicts in the following ways: - Several alternatives (operator |) cause a conflict if their FIRST sets are not disjoint: the alternative given first is selected. - An optional part (operators [] and *) causes a conflict if its FIRST set is not disjoint from its FOLLOW set: the optional part will be analyzed because otherwise it would be useless. - Parts that may be repeated at least once cause a conflict if their FIRST and FOLLOW sets are not disjoint (as above): the repetition will be con- tinued because otherwise it would be executed only once. With the above rules it can happen that alternatives are never taken or that it is impossible for a repetition to terminate for any correct input. These cases as well as left recursion are considered to be serious design faults in the grammar and are reported as errors. Otherwise LL(1) conflicts are resolved as described above and reported as warnings. 4. Error Recovery 4.1. User's View The generated parsers include information and program code to handle syn- tax errors completely automatically and provide expressive error reporting, recovery, and repair. Every incorrect input is "virtually" transformed into a 4 Source Program: MODULE test; BEGIN IF (a = ] 1 write (a) END; END test. Error Messages: 3, 12: Error syntax error 3, 12: Information expected symbols: Ident Integer Real String '(' '+' '-' '{' 'NOT' 3, 14: Information restart point 3, 16: Error syntax error 3, 16: Information restart point 3, 16: Repair symbol inserted : ')' 3, 16: Repair symbol inserted : 'THEN' Fig. 1: Example of Automatic Error Messages syntactically correct program with the consequence of executing only a "correct" sequence of semantic actions. Therefore the following compiler phases like semantic analysis don't have to bother with syntax errors. Ell provides a prototype error module which prints messages as shown in Figure 1. Appendix 4 contains a larger example demonstrating the behaviour of our method. Internally the error recovery works as follows: - The location of the syntax error is reported. - If possible, the tokens that would be a legal continuation of the program are reported. - The tokens that can serve to continue parsing are computed. A minimal sequence of tokens is skipped until one of these tokens is found. - The recovery location (restart point) is reported. - Parsing continues in the so-called repair mode. In this mode the parser behaves as usual except that no tokens are read from the input. Instead a minimal sequence of tokens is synthesized to repair the error. The parser stays in this mode until the input token can be accepted. The synthesized tokens are reported as inserted symbols. The program can be regarded as repaired, if the skipped tokens are replaced by the synthesized ones. Upon leaving repair mode, parsing continues as usual. 4.2. Implementation During LL(1) analysis the following kinds of syntax errors can occur: at the analysis of a terminal the current (look-ahead) token can be different from the expected terminal. At the analysis of alternatives the current token can be member of none of the FIRST sets of the possible branches. At the analysis of optional or iterated parts the current token can be member of nei- ther the FIRST set nor the FOLLOW set of the construct. In order to achieve good quality error recovery, the latter test has to be performed before the 5 analysis of an optional part and before every iteration. This section discusses the interesting aspects of error recovery: how the sets of expected tokens are defined, how parsing is continued after syntax errors, how error repair works, and how an efficient implementation is achieved. 4.2.1. Expected Symbols At the location of a syntax error, we try to report the set of expected tokens. To be able to report the exact set of expected tokens, syntax errors have to be detected as early as possible. Furthermore, information must be maintained during parsing because the exact sets depend on the dynamic call hierarchy. Early detection of errors requires the knowledge of the exact FOL- LOW sets which also depend on the call hierarchy. In order to gain effi- ciency, we report only the subset of the expected tokens which can be computed at generation time. This set is necessary for every potential error location. In case of a terminal the expected token is just this terminal. In case of alternatives the set of expected tokens is the union of the FIRST sets of the alternatives. In case of optional or iterated parts the set of expected tokens is the union of the FIRST set and of the local FOLLOW set of the con- struct. All the sets are computed at generation time and stored in the gen- erated parsers. 4.2.2. Recovery Sets For every possible syntax error a so-called recovery set is determined containing the tokens where parsing can continue. We present the definition of recovery sets for plain BNF, first. Appendix 3 shows the extension to extended BNF using an attribute grammar formalism. In case of a syntax error the situation is as follows (see Fig. 2): - Analysis of the productions p1, . . . , pn has started. - Every production pi is processed by a procedure called Xi. - Every procedure Xi calls a procedure Xi+1 to analyze a nonterminal of the right-hand side. - Procedure Xn detects an error at position Z. - The current call hierarchy is X1, . . . , Xn. 6 p1: X1 -> Y11 ... X2 Y1j1 ... Y1n1 p2: X2 -> Y21 ... X3 Y2j2 ... Y2n2 ... pn-1: Xn-1 -> Yn-1,1 ... Yn-1,in-1 Xn Yn-1,jn-1 ... Yn-1,nn-1 pn: Xn -> Yn1 ... Ynin Z Ynjn ... Ynnn Fig. 2: Situation in case of a syntax error (pi P, Xi N, Yij V, Z V) Let us concentrate first on the situation locally in production pn. Parsing could continue at the symbols Ynjn, . . . , Ynnn. The set of tokens that allow to continue parsing, or in other words the local recovery set Rn is therefore the union of the FIRST sets of Ynjn, . . . , Ynnn: Rn = k=jnUnn FIRST (Ynk) However, in general there may be no token behind the location of the error which is member of the local recovery set Rn. Therefore, the productions pn-1, . . . , p1 have to be taken into account, too. Parsing can continue at all symbols not analyzed yet: Yn-1,jn-1 ... Yn-1,nn-1 ... Y2j2 ... Y2n2 Y1j1 ... Y1n1 Therefore in general we need all local recovery sets Ri: Ri = k=jiUni FIRST (Yik) i = 1, . . . , n The global recovery set R is the union of all involved local recovery sets: R = i=1Un Ri 7 The global recovery set R is used to stop skipping of tokens. Tokens are skipped until one is reached that is member of R. This location is called a restart point. This method of recovery terminates because only symbols not yet analyzed are considered as potential restart points. Their processing is con- tained in the current call hierarchy. In extreme cases the complete rest of the input is skipped. To guarantee termination of skipping, a special token for end of file (sEof) has to be member of R. This is assured by augmenting the grammar by the following rule: p0: X0 -> X1 sEof where X1 is the original and X0 is the new start symbol of the grammar. The definition of R can be modified in some ways. First, if the symbol Z is a terminal it can be included in R in order to improve the behaviour of error recovery in case of superfluous tokens. Second, it is possible to exclude some elements (except sEof) from R without causing the recovery to fail. However, its behaviour becomes more coarse because eventually more tokens are skipped. If only sEof remains in R, we arrive at panic mode: after skipping the rest of the input no more errors can be found. The computation of the global recovery set R can consume a considerable amount of run time. For efficiency reasons, we found the following solution quite satisfactory: The local recovery sets can be computed at generation time. For every right-hand side symbol (or position) a local recovery set is computed and stored. The global recovery set has to be computed at run time, as it depends on the current call hierarchy. The union of the local recovery sets is computed only in the case of an error. As long as there is no error, it suffices to maintain a simple data structure that allows to compute the union on demand. If the local recovery sets are stored in an array of sets it is enough to know the index of a set in this array. The global restart set is represented as a list of such indices. This list allows to effectively compute the global restart set whenever needed. A list representing the global restart set is passed via a second parame- ter to every procedure. In case of an error, the local recovery set is added to the list and then the real set union is accomplished. Before calling another procedure to analyze a nonterminal, the list is extended by a suitable element. The list elements are declared as local variable of the procedures. This way, allocation and deallocation of storage is for free. For details see Fig. 3 as well as Appendix 1. 4.2.3. Error Repair Every incorrect program is repaired (or transformed) into a syntactically correct program. Error repair is accomplished by skipping tokens as described above and by imaginary inserting tokens. This insertion is realized relatively easy by continuing parsing as nothing would have happened. Whenever a terminal is expected which is different from the current input token, it is reported as inserted. Whenever alternatives are analyzed and the current input token is 8 TYPE tUnionPtr = POINTER TO tUnion; TYPE tUnion = RECORD (* type for list elements *) GlobalRecoverySet : tUnionPtr; LocalRecoverySet : SHORTCARD; END; (* procedure for a nonterminal *) PROCEDURE Module (GlobalRecoverySet: tUnionPtr; VAR Module0: tParsAttribute); VAR Union : tUnion; Ident1: tScanAttribute; Block1: tParsAttribute; ... BEGIN Union.GlobalRecoverySet := GlobalRecoverySet; IF Token # sMODULE THEN (* analysis of a literal *) RecoveryLiteral (sMODULE, 138, GlobalRecoverySet); ELSE Token := GetToken (); IsRepairMode := FALSE; END; IF Token # sIdent THEN (* analysis of an attributed terminal *) RecoveryTerminal (sIdent, 138, GlobalRecoverySet, Ident1); ELSE Ident1 := Attribute; (* receive attribute from scanner *) Token := GetToken (); IsRepairMode := FALSE; END; Union.LocalRecoverySet := 57; (* analysis of a nonterminal *) Block (SYSTEM.ADR (Union), Block1); END Module; Fig. 3: Scheme of the Code for Error Recovery member of none of the possible FIRST sets, an arbitrary branch, which is non- recursive, is selected. The restriction to non-recursive branches is necessary to guarantee termination. There always exists a non-recursive branch as long as the grammar is reduced. Semantic actions are executed during error repair as usual. Error recovery and error repair is combined in mainly three procedures to handle literals, terminals, and alternatives or EBNF constructs, respectively (see Appendix 2). To avoid superfluous error messages the parser knows two modes. In normal mode errors are reported, tokens are skipped, and the mode is changed to repair mode. In repair mode neither errors are reported nor tokens are skipped. Instead the inserted tokens are reported. The mode is switched back to normal when the analysis of a terminal is successful. Then error recovery is finished and parsing continues normally. The local variable Union is a list element for the representation of the global recovery set. It suffices to append this element to the list once in every procedure. Before calling another procedure to analyze a nonterminal the index of a local recovery set is assigned to the list element (here: 57). The 9 extended list is passed as parameter. The procedures for error recovery called ErrorRecovery, RecoveryTerminal, and RecoveryLiteral are given in Appendix 2. They are called only in case of syntax errors. As errors are con- sidered to be a rare event, these procedures do not have to care about effi- ciency. Their parameters describe the expected token (here: sMODULE and sIdent), the index of the local recovery set (here: 138), and the global recovery set. For attributed terminals a fourth parameter specifies the vari- able receiving the attributes from the scanner. The interface to the scanner consists primarily of the following objects: the procedure GetToken returns the next input token, the global variable Attribute contains the attributes of the current token, and the procedure ErrorAttribute (see Appendix 2) is called by the parser to get attribute values for synthesized tokens. 5. Related Work Much work has been published about error recovery. We limit this discus- sion on giving reasons for our solution and mention a few similar methods. The design of our error recovery was guided by the following requirements: - automatic derivation of error recovery from the grammar - efficient parsing in terms of run time - provision of error repair Efficiency implies a backtrack-free strategy and asks for a clever imple- mentation. Automatic derivation and a backtrack-free strategy imply more or less the definition of recovery sets as given above. We consider error repair to be important because error recovery should not consider syntactical aspects only. Syntax analysis is usually combined with semantic analysis. Although error repair might not transform an error the way the programmer originally intended, it does transform every erroneous program into a syntactically correct one with the consequence that only syn- tactically correct information is passed to semantic analysis. This allows great simplifications in the latter because it does not have to care about syntax errors. The definition for the recovery sets given above is not new as it is somehow inherent in the problem. Similar definitions have been presented pre- viously e. g. by [Iro63, ReM85, SMM84]. Hand-written recursive descent parsers usually implement similar designs [Wir86]. The advantages of our solu- tion are the provision of error repair and the efficient implementation of error recovery. The compiler generators Coco [ReM85] and Coco/R [M\90] use the same strategy for error recovery but do not provide error repair. The effi- ciency of the parsers generated by Coco/R is comparable to the efficiency of our method. Using Coco/R, error recovery has to be tailored by giving simple directives. Ell does not require any user engagement and produces error recovery automatically. Appendix 4 presents the behaviour of our method applied to the example program used in [M\90]. The rather long listing of messages is the direct output of the information provided by the parser. The final layout of the messages can be easily adapted to the ideas of the user. 10 6. Implementation Issues Parsing is implemented using recursive procedures as outlined above. The operators of extended BNF are mapped to statements of the target language as given in Appendix 1. In obvious cases simple optimizations are exploited. For example the checks for terminals and literals can be omitted in some cases. More sophisticated implementation decisions concern the CASE/switch statements and the test for set membership. If C is used as target language, it turned out that many C compilers optimize switch statements in favour of storage. If the set of case labels is non-compact, a sorted list of values and addresses is generated. A run time system routine performs binary search in this table in order to map the current switch value to the address of a case branch. To trick the C compiler, Ell inserts dummy labels to make the set of case labels compact. Then the C compilers use a jump table which executes con- siderably faster. Generating compact sets of case labels improved the over all run time of the parsers by 30%. Sets are implemented as bit vectors if they contain more than one ele- ment. In general an array of memory words is needed to store the bits of one set (see Fig. 4). The membership test would be coded as follows: Seti: ARRAY [0..k] OF BITSET; e Seti _ (e MOD 32) IN Seti [e DIV 32] On a MC 68020 processor this produces the following six machine instructions including two divide instructions: movl _e:l,d1 divsll #0x20,d2:d1 movl _e:l,d3 divsl #0x20,d3 movl (_Set:l,d3:w:4),d3 btst d2,d3 It is much more advantageous to store the sets vertically instead of hor- izontally as above. 32 sets can be stored side by side in a sufficient number of words (see Fig. 5). The membership test is coded as follows: Fig. 4: horizontal set Fig. 5: vertical set 11 Set: ARRAY [0..m] OF BITSET; e Seti _ i IN Set [e] Note, that i is a constant. Now on a MC 68020 processor two machine instruc- tions suffice: movw _e+2:l,d4 btst #4,(_Set:l,d4:w:4) If the global variables e and Set could be stored in registers, the membership test could be done even with one machine instruction, only: btst #4,a0@(0,d0:l) This implementation of the membership test has been previously described in [Gra88b]. In our case, the two ways of the membership test account for a difference of another 30% in the over all run time of the parser. 7. Summary We presented the interesting features of the recursive descent parser generator Ell from the user's point of view and from the implementation aspects. The outstanding features of Ell are the L-attribution mechanism, its ability to handle non LL(1) grammars, and its automatic and comfortable error handling, which includes error reporting, recovery, and repair. The generated parsers are extremely efficient in terms of run time. For example a Modula-2 parser runs at a speed of 55,000 tokens per second or 900,000 lines per minute on a MC 68020 processor (excluding scanning). The size of the parser is 25 K bytes and the run time of the generator is 10 seconds. Acknowledgements Whereas the author designed the generated code and the error recovery, the generator Ell was programmed by D. Kuske. The implementation of the global recovery sets was discovered independently by D. Schwartz-Hertzner and D. Kuske. B. Vielsack added the generation of C code, the L-attribution mechan- ism, and the disambiguating rules for non LL(1) grammars. The implementation of the set membership test is due to W. M. Waite and B. Gray. References [DuW81] R. C. Dunn and W. M. Waite, SYNPUT, Department of Electrical Engineering, Univ. of Colorado, Boulder, CO, Feb. 1981. [Gra88a] R. W. Gray, Automatic Error Recovery in a Fast Parser, Summer USENIX Conference, , 1988. 12 [Gra88b] R. W. Gray, \-GLA - A Generator for Lexical Analyzers That Programmers can Use, University of Colorado, Boulder, 1988. [Gro88] J. Grosch, Generators for High-Speed Front-Ends, LNCS 371, (Oct. 1988), 81-92, Springer Verlag. [GrV88] J. Grosch and B. Vielsack, The Parser Generators Lalr and Ell, Compiler Generation Report No. 8, GMD Forschungsstelle an der Universitat Karlsruhe, Apr. 1988. [Iro63] E. T. Irons, An Error Correcting Parse Algorithm, Comm. ACM 6, 11 (Nov. 1963), 669-673. [M90] H. Mossenbock, Coco/R - A Generator for Fast Compiler Front-Ends, Report No. 127, Departement Informatik, ETH Zurich, Feb. 1990. [ReM85] P. Rechenberg and H. Mossenbock, Ein Compiler-Generator fur Mikrocomputer, Hanser, Munchen, 1985. [SMM84] M. Spenke, H. Muhlenbein, M. Mevenkamp, F. Mattern and C. Beilken, A Language Independent Error Recovery Method for LL(1) Parsers, Software- Practice & Experience 14, 11 (Nov. 1984), 1095-1107. [Wil79] R. Wilhelm, Attributierte Grammatiken, Informatik Spektrum 2, 3 (1979), 123-130. [Wir86] N. Wirth, Compilerbau, Teubner, Stuttgart, 1986. 13 Appendix 1: Scheme of the Code Generated for EBNF Constructs (* Literal t *) IF Token # t THEN RecoveryLiteral (t, Recover(t), GlobalrecoverySet); ELSE Token := GetToken (); IsRepairMode := FALSE; END; (* Terminal t (with attribute ti) *) IF Token # t THEN RecoveryTerminal (t, Recover(t), GlobalrecoverySet, ti); ELSE ti := Attribute; Token := GetToken (); IsRepairMode := FALSE; END; (* Nonterminal X (with attribute Xi) *) Union.LocalRecoverySet := Recover(X); X (SXSTEM.ADR (Union), Xi); LOOP (* Optional part X = [ Y ] *) IF Token FIRST (Y) THEN <Y> EXIT; ELSIF Token FOLLOW (X) OR IsRepairMode THEN EXIT; END; ErrorRecovery (Expected (X), Recover (X), GlobalRecoverySet); END; LOOP (* Iteration X = Y * or X = { Y } *) IF Token FIRST (Y) THEN <Y> ELSIF Token FOLLOW (X) OR IsRepairMode THEN EXIT; ELSE ErrorRecovery (Expected (X), Recover (X), GlobalRecoverySet); END; END; LOOP (* Iteration X = Y + or X = Y { Y } *) <Y> IF Token FIRST (Y) THEN IF Token FOLLOW (X) THEN EXIT; END; ErrorRecovery (Expected (X), Recover (X), GlobalRecoverySet); IF Token FIRST (Y) THEN EXIT; END; END; END; LOOP (* Iteration X = Y || Z or X = Y { Z Y } *) <Y> IF Token FIRST (Z) THEN IF Token FOLLOW (X) THEN EXIT; END; ErrorRecovery (Expected (X), Recover (X), GlobalRecoverySet); IF Token (FIRST (Y) U FIRST (Z)) THEN EXIT; END; END; <Z> END; LOOP (* Alternative = X = Y1 | ... | Yn *) CASE Token OF | FIRST (Y1) & FOLLOW (Y1): <Y1> EXIT; ... | FIRST (Yn) & FOLLOW (Yn): <Yn> EXIT; ELSE (* Yd (1 <= d <= n): default alternative for error case *) (* duplication of one of the above *) 14 IF IsRepairMode THEN <Yd> EXIT; END; ErrorRecovery (Expected (X), Recover (X), GlobalRecoverySet); END; END; 15 Appendix 2: Procedures for Error Recovery TYPE tUnionPtr = POINTER TO tUnion; TYPE tUnion = RECORD (* type for list elements *) GlobalRecoverySet : tUnionPtr; LocalRecoverySet : SHORTCARD; END; TYPE tSet = ARRAY [0..Upb] OF BITSET; (* type for bit sets *) VAR SetMemory : ARRAY [0..169] OF tSet; (* storage for horizontal sets *) (* test for set membership *) PROCEDURE IsElement (Set: tSet; Element: SHORTCARD): BOOLEAN; BEGIN RETURN Element MOD BitsPerBitset IN Set [Element DIV BitsPerBitset]; END IsElement; (* compute global recovery set and skip tokens *) PROCEDURE SkipTokens (LocalRecoverySet: SHORTCARD; GlobalRecoverySet: tUnionPtr); VAR RecoverySet: tSet; i: SHORTCARD; BEGIN RecoverySet := SetMemory [LocalRecoverySet]; INCL (RecoverySet [0], sEof); WHILE GlobalRecoverySet # NIL DO FOR i := 0 TO Upb DO RecoverySet [i] := RecoverySet [i] + SetMemory [GlobalRecoverySet^.LocalRecoverySet] [i]; END; GlobalRecoverySet := GlobalRecoverySet^.GlobalRecoverySet; END; WHILE NOT IsElement (RecoverySet, Token) DO Token := GetToken (); END; ErrorMessage (RestartPoint, Information, Line, Column); IsRepairMode := TRUE; END SkipTokens; PROCEDURE ErrorRecovery (ExpectedSet : SHORTCARD; LocalRecoverySet : SHORTCARD; GlobalRecoverySet: tUnionPtr); BEGIN IF NOT IsRepairMode THEN INC (ErrorCount); ErrorMessage (SyntaxError, Error, Line, Column); ErrorMessageI (ExpectedSymbols, Information, Line, Column, TokenSet, SYSTEM.ADR (SetMemory [ExpectedSet])); SkipTokens (LocalRecoverySet, GlobalRecoverySet); END; END ErrorRecovery; 16 PROCEDURE RecoveryTerminal (Expected : SHORTCARD; LocalRecoverySet : SHORTCARD; GlobalRecoverySet: tUnionPtr; VAR RepairAttribute: tScanAttribute); (* for terminals only *) BEGIN IF NOT IsRepairMode THEN INC (ErrorCount); ErrorMessage (SyntaxError, Error, Line, Column); ErrorMessageI (ExpectedSymbols, Information, Line, Column, Symbol, SYSTEM.ADR (Expected)); SkipTokens (LocalRecoverySet, GlobalRecoverySet); END; IF Token # Expected THEN ErrorMessageI (SymbolInserted, Repair, Line, Column, Symbol, SYSTEM.ADR (Expected)); ErrorAttribute (Expected, RepairAttribute); (* for terminals only *) ELSE RepairAttribute := Attribute; (* for terminals only *) IF Token # sEof THEN Token := GetToken (); END; IsRepairMode := FALSE; END; END RecoveryTerminal; PROCEDURE ErrorRecoveryLiteral (Expected : SHORTCARD; LocalRecoverySet : SHORTCARD; GlobalRecoverySet: tUnionPtr); (* like ErrorRecoveryTerminal except as marked above *) Appendix 3: Attribute Grammar to Compute the Recovery Sets N nonterminal E expression t terminal or literal R recovery set In, Out temporary attributes N = E {E.In := ; } E = t {E.R := FIRST (t) U E.In; E.Out := E.R; } E = N {E.R := E.In; E.Out := FIRST (N) U E.In; } E = [ E1 ] {E1.In := E.In; E.R := FIRST (E1) U E.In; E.Out := E.R;} E = E1 * {E1.In := E.In; E.R := FIRST (E1) U E.In; E.Out := E.R;} E = E1 + {E1.In := E.In; E.R := FIRST (E1) U E.In; E.Out := E.R;} E = E1 || E2 {E1.In := FIRST (E2) U E.In; E2.In := FIRST (E1) U E.In; E.R := FIRST (E1) U FIRST (E2) U E.In; E.Out := E.R;} E = E1 |...| En {E1.In := E.In; ... En.In := E.In; E.R := FIRST (E1) U ... U FIRST (En) U E.In; E.Out := E.R;} E = E1 E2 {E2.In := E.In; E1.In := E2.Out; E.R := E1.Out; E.Out := E.R;} 17 Appendix 4: Example of Error Recovery Source Program: 1 MODULE Error; 2 CONST M := 10, N = 100 X = 10; 3 VAR , a, b, c; 4 5 PROCEDURE P; 6 BEGIN 7 s := 0; a = 5 * (b - 1 END; 8 9 BEGIN 10 > a > b; 11 WHILE a DO 12 BEGIN > b; - c := 0; 13 WHILE a > 0 BEGIN 14 IF ODD a c := c * - b; 15 b := 2 * b a := a /2 16 END; 17 P := 0; P; 666; 18 END . Error Messages: 2, 9: Error syntax error 2, 9: Information expected symbols: '=' 2, 12: Information restart point 2, 12: Repair symbol inserted : '=' 2, 14: Error syntax error 2, 14: Information expected symbols: ';' 2, 16: Information restart point 2, 16: Repair symbol inserted : ';' 2, 25: Error syntax error 2, 25: Information restart point 2, 25: Repair symbol inserted : ';' 3, 5: Error syntax error 3, 7: Information restart point 3, 14: Error syntax error 3, 14: Information expected symbols: ':' 3, 14: Information restart point 3, 14: Repair symbol inserted : ':' 3, 14: Repair symbol inserted : Ident 7, 13: Error syntax error 7, 13: Information expected symbols: '(' ':=' ';' 7, 19: Information restart point 7, 26: Error syntax error 7, 26: Information restart point 7, 26: Repair symbol inserted : ')' 7, 29: Error syntax error 7, 29: Information expected symbols: Ident 7, 29: Information restart point 7, 29: Repair symbol inserted : Ident 18 10, 3: Error syntax error 10, 3: Information expected symbols: Ident ';' 'CASE' 'EXIT' 'FOR' 'IF' 'LOOP' 'REPEAT' 'RETURN' 'WHILE' 'WITH' 10, 5: Information restart point 10, 7: Error syntax error 10, 7: Information expected symbols: '(' ':=' ';' 10, 9: Information restart point 10, 9: Repair symbol inserted : ';' 12, 5: Error syntax error 12, 5: Information expected symbols: Ident ';' 'CASE' 'EXIT' 'FOR' 'IF' 'LOOP' 'REPEAT' 'RETURN' 'WHILE' 'WITH' 12, 13: Information restart point 12, 16: Error syntax error 12, 16: Information expected symbols: Ident ';' 'CASE' 'EXIT' 'FOR' 'IF' 'LOOP' 'REPEAT' 'RETURN' 'WHILE' 'WITH' 12, 18: Information restart point 13, 17: Error syntax error 14, 7: Information restart point 14, 7: Repair symbol inserted : 'DO' 14, 14: Error syntax error 14, 14: Information restart point 14, 14: Repair symbol inserted : 'THEN' 14, 16: Error syntax error 14, 16: Information restart point 14, 16: Repair symbol inserted : ';' 14, 25: Error syntax error 14, 25: Information expected symbols: Ident Integer Real String '(' '{' 'NOT' 14, 25: Information restart point 14, 25: Repair symbol inserted : Integer 15, 18: Error syntax error 15, 18: Information restart point 15, 18: Repair symbol inserted : ';' 17, 16: Error syntax error 17, 16: Information expected symbols: Ident ';' 'CASE' 'EXIT' 'FOR' 'IF' 'LOOP' 'REPEAT' 'RETURN' 'WHILE' 'WITH' 17, 19: Information restart point 18, 7: Error syntax error 18, 7: Information restart point 18, 7: Repair symbol inserted : 'END' 18, 7: Repair symbol inserted : 'END' 18, 7: Repair symbol inserted : Ident 1 Contents Abstract ........................................................ 1 1. Introduction .................................................... 1 2. L-Attribution ................................................... 2 2.1. User's View ..................................................... 2 2.2. Implementation .................................................. 3 3. Non LL(1) Grammars .............................................. 3 4. Error Recovery .................................................. 3 4.1. User's View ..................................................... 3 4.2. Implementation .................................................. 4 4.2.1. Expected Symbols ................................................ 5 4.2.2. Recovery Sets ................................................... 5 4.2.3. Error Repair .................................................... 7 5. Related Work .................................................... 9 6. Implementation Issues ........................................... 10 7. Summary ......................................................... 11 Acknowledgements ................................................ 11 References ...................................................... 11 Appendix 1: Scheme of the Code Generated for EBNF Constructs .................................................................... 13 Appendix 2: Procedures for Error Recovery ....................... 15 Appendix 3: Attribute Grammar to Compute the Recovery Sets ...... 16 Appendix 4: Example of Error Recovery ........................... 17